Visual Instruction Tuning
LLaVA paper: aligning LLMs with visual information through instruction tuning on image-text pairs, enabling multimodal understanding and reasoning.
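To make the alignment idea concrete, here is a minimal sketch of the kind of trainable projection LLaVA uses to map frozen vision-encoder patch features into the LLM's embedding space; the class name and dimensions below are illustrative placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class VisionToLLMProjector(nn.Module):
    """Projects vision-encoder patch features into the LLM token embedding space.

    Illustrative sketch: LLaVA trains a projection like this so image features
    can be fed to the language model as a sequence of "visual tokens".
    Dimensions are placeholders, not the paper's exact configuration.
    """
    def __init__(self, vision_dim=1024, llm_dim=4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, patch_features):
        # patch_features: (batch, num_patches, vision_dim) from a frozen vision encoder
        return self.proj(patch_features)  # (batch, num_patches, llm_dim)

# The projected visual tokens are concatenated with text token embeddings
# and passed through the LLM during instruction tuning.
visual_tokens = VisionToLLMProjector()(torch.randn(1, 256, 1024))
print(visual_tokens.shape)  # torch.Size([1, 256, 4096])
```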
Investigating the effectiveness of plain Vision Transformers as backbones for object detection and proposing modifications to improve their performance.
Introducing YOLO, a unified, real-time object detection system that frames object detection as a single regression problem.
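As a rough illustration of the single-regression framing, the sketch below shows the shape of the tensor a YOLO network outputs per image and how one grid cell is read; the grid size, box count, and class count follow the original paper's 7x7 grid with 2 boxes and 20 classes, and the random tensor stands in for real network output.

```python
import torch

# YOLO frames detection as regressing a single tensor per image:
# an S x S grid where each cell predicts B boxes (x, y, w, h, confidence)
# plus C class probabilities. Values here match the original paper's setup.
S, B, C = 7, 2, 20
prediction = torch.rand(S, S, B * 5 + C)  # stand-in for the network output

# Reading one grid cell: two candidate boxes and a shared class distribution.
cell = prediction[3, 4]
boxes = cell[: B * 5].reshape(B, 5)      # each row: x, y, w, h, confidence
class_probs = cell[B * 5:]               # length-C class probabilities
print(boxes.shape, class_probs.shape)    # torch.Size([2, 5]) torch.Size([20])
```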
Introducing EfficientNet, a family of convolutional neural networks that achieve state-of-the-art accuracy with significantly improved efficiency through a novel compound scaling method.
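A brief sketch of the compound scaling rule named above: depth, width, and input resolution are grown together by a single coefficient phi, using the base coefficients reported in the paper (alpha=1.2, beta=1.1, gamma=1.15). The exact phi used for each EfficientNet variant differs, so the loop below only shows the trend.

```python
# EfficientNet's compound scaling: grow depth, width, and input resolution
# together with one coefficient phi. Coefficients are the paper's
# grid-searched base values (alpha * beta^2 * gamma^2 ~= 2).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi: int):
    depth_mult = ALPHA ** phi       # more layers
    width_mult = BETA ** phi        # more channels
    resolution_mult = GAMMA ** phi  # larger input images
    return depth_mult, width_mult, resolution_mult

for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```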
Faster R-CNN explained: how Region Proposal Networks (RPN) enable near real-time object detection with shared convolutional features.
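A minimal sketch of such an RPN head, assuming the commonly used defaults of a 512-channel shared feature map and 9 anchors per location: a small conv network slides over the shared backbone features and predicts an objectness score plus box offsets for every anchor.

```python
import torch
import torch.nn as nn

class RegionProposalHead(nn.Module):
    """Sketch of an RPN head operating on a shared conv feature map."""
    def __init__(self, in_channels=512, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 512, kernel_size=3, padding=1)
        self.objectness = nn.Conv2d(512, num_anchors, kernel_size=1)      # object vs. background score
        self.box_deltas = nn.Conv2d(512, num_anchors * 4, kernel_size=1)  # per-anchor box offsets

    def forward(self, features):
        x = torch.relu(self.conv(features))
        return self.objectness(x), self.box_deltas(x)

# The same backbone features feed both the RPN and the detection head,
# which is what makes the proposals nearly free at inference time.
scores, deltas = RegionProposalHead()(torch.randn(1, 512, 38, 50))
print(scores.shape, deltas.shape)  # (1, 9, 38, 50) and (1, 36, 38, 50)
```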
Introducing SAM (Segment Anything), a promptable model that can segment any object in an image from a wide range of prompts, including points, boxes, and text.
Introducing DETR, a novel end-to-end object detection framework that leverages Transformers to directly predict a set of object bounding boxes.
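To illustrate the set-prediction idea, the sketch below matches a fixed set of predicted boxes one-to-one to ground-truth boxes with the Hungarian algorithm, as DETR does during training; the plain L1 cost used here is a simplification of the paper's matching cost (which also includes class-probability and GIoU terms), and the boxes are random stand-ins.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# DETR predicts a fixed-size set of boxes and matches them one-to-one to
# ground-truth objects; unmatched predictions are trained toward a
# "no object" class.
pred_boxes = np.random.rand(100, 4)   # 100 predicted boxes (cx, cy, w, h)
gt_boxes = np.random.rand(5, 4)       # 5 ground-truth boxes

# Simplified matching cost: L1 distance between box parameters.
cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)  # (100, 5)
pred_idx, gt_idx = linear_sum_assignment(cost)
print(list(zip(pred_idx, gt_idx)))  # each ground-truth box gets exactly one prediction
```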
Introducing BLIP-2, a new vision-language model that leverages frozen image encoders and large language models to achieve improved efficiency and performance in various multimodal tasks.
Vision Transformer (ViT) explained: how splitting images into 16x16 patches enables pure transformer architecture for state-of-the-art image recognition.
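A minimal sketch of the patch-embedding step described there, using ViT-Base-like sizes (16x16 patches, 768-dimensional embeddings); the strided convolution is the usual idiom for splitting an image into patches and linearly projecting them in one step.

```python
import torch
import torch.nn as nn

# ViT's core move: split the image into fixed-size patches, flatten each
# patch, and linearly embed it so a standard transformer can treat the
# patches as a token sequence.
image = torch.randn(1, 3, 224, 224)
patch_size, embed_dim = 16, 768

to_patches = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)
tokens = to_patches(image).flatten(2).transpose(1, 2)
print(tokens.shape)  # (1, 196, 768): 224/16 = 14, so 14*14 = 196 patch tokens
```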
Introducing SURF (Speeded Up Robust Features), a fast and robust algorithm for local feature detection and description, often used in applications like object recognition, image registration, and 3D reconstruction.
Swin Transformer: hierarchical Vision Transformer using shifted windows for efficient image classification, object detection, and segmentation.
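A small sketch of the window mechanics, with illustrative sizes: features are partitioned into non-overlapping 7x7 windows for local self-attention, and alternating layers shift the feature map by half a window so information can cross window boundaries.

```python
import torch

# Swin computes self-attention inside non-overlapping local windows, then
# shifts the windows between layers so information flows across window
# boundaries. Sizes here are illustrative.
x = torch.randn(1, 56, 56, 96)   # (batch, height, width, channels) feature map
window = 7

def window_partition(x, w):
    B, H, W, C = x.shape
    x = x.view(B, H // w, w, W // w, w, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, w * w, C)  # (num_windows*B, w*w, C)

regular = window_partition(x, window)                                 # attention within each 7x7 window
shifted = window_partition(torch.roll(x, (-3, -3), (1, 2)), window)   # shifted-window layer
print(regular.shape, shifted.shape)  # both (64, 49, 96)
```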
CLIP explained: contrastive learning on 400M image-text pairs enables zero-shot image classification and powerful vision-language understanding.
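A compact sketch of the symmetric contrastive objective behind CLIP, with random embeddings standing in for the outputs of the image and text encoders; the fixed temperature of 0.07 matches the paper's initial value, though CLIP actually learns it during training.

```python
import torch
import torch.nn.functional as F

# Each image should match its own caption and no other caption in the batch,
# and vice versa; matching pairs lie on the diagonal of the similarity matrix.
batch, dim = 8, 512
image_emb = F.normalize(torch.randn(batch, dim), dim=-1)
text_emb = F.normalize(torch.randn(batch, dim), dim=-1)

temperature = 0.07
logits = image_emb @ text_emb.T / temperature      # (batch, batch) similarity matrix
targets = torch.arange(batch)                      # diagonal entries are the true pairs
loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
print(loss.item())
```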
ResNet analysis: how skip connections and residual learning solved the degradation problem, enabling training of 100+ layer neural networks.
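A minimal sketch of the basic residual block (same-channel case, no downsampling): the identity shortcut adds the block's input back to its output, so the stacked layers only need to learn a residual correction and gradients can flow through the addition unimpeded.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: the skip connection adds the input back to the
    block's output, so the layers only have to learn a residual correction."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)  # identity shortcut: gradients pass through unchanged

print(ResidualBlock()(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```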